Abstract

I present a method for lossy transform coding of digital audio 
that uses the Weyl symbol calculus for constructing the encoding and 
decoding transformation. The method establishes a direct connection
between a time-frequency representation of the signal-dependent
threshold of masked noise and the encode/decode pair. The formalism
also offers a time-frequency measure of perceptual entropy. 

1 Introduction 

In lossy transform coding, a signal is transformed before re-quantization, and
then partially recovered by applying the inverse transformation. In perceptual
codecs, the goal is to make the necessarily introduced noise imperceptible.
Mathematically, let $\psi$ be the original signal, $K$ (for "key") be a linear
transformation, and $L = K^{-1}$ (for "lock") be its inverse. Then

$$\psi' = K Q L \psi \tag{1.1}$$

is the recovered signal. Here, $Q$ is a quantization operator. It is not a
linear operator, but we can often model $Q$ as introducing noise:

$$Q\phi = \phi + a_Q X, \tag{1.2}$$

where $X$ is a time series of uniformly distributed independent random
variables on the interval $(-1/2, 1/2)$, and the constant $a_Q$ is determined
by the quantization scale. Hence, for the reconstituted signal as above, the
introduced noise is

$$a_Q K X. \tag{1.3}$$

The noise is no longer white, but rather shaped by the operator K. 
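
The noise model of Eq. (1.2) and the shaping of Eq. (1.3) can be checked numerically. The following is a minimal sketch, assuming NumPy, a Gaussian test signal that is wide compared with the quantization step, and a three-tap filter standing in for the key; the names `quantize`, `a_q`, `key_filter` and `lag1_correlation` are illustrative and do not come from the text.

```python
import numpy as np

def quantize(phi, a_q):
    """Round phi to the nearest multiple of the quantization scale a_q."""
    return a_q * np.round(phi / a_q)

def lag1_correlation(v):
    """Normalized correlation between a series and its one-sample shift."""
    v = v - v.mean()
    return float(np.dot(v[:-1], v[1:]) / np.dot(v, v))

rng = np.random.default_rng(0)
a_q = 0.25                                         # hypothetical quantization scale
phi = rng.normal(scale=50 * a_q, size=200_000)     # test signal, wide compared with a_q

# Quantization error in units of a_q: this plays the role of X in Eq. (1.2)
x = (quantize(phi, a_q) - phi) / a_q

# A hypothetical key: a short smoothing filter applied as a convolution
key_filter = np.array([0.5, 1.0, 0.5])
shaped = np.convolve(x, key_filter, mode="same")   # plays the role of K X

print("max |X|         :", np.abs(x).max())        # bounded by 1/2
print("variance of X   :", x.var())                # close to 1/12
print("lag-1 corr, X   :", lag1_correlation(x))    # near zero: white
print("lag-1 corr, K X :", lag1_correlation(shaped))  # clearly positive: colored
```

The error is white to a good approximation only because the test signal spans many quantization steps; for signals comparable to the step size the model of Eq. (1.2) is less reliable.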

A good psychoacoustic model will determine whether this noise is masked
by $\psi$, and a good lossy encoding algorithm will choose $K$ so that it
minimizes the combined storage requirements of the key $K$ and the encoded
signal $Q K^{-1}\psi$, subject to the constraint that the introduced noise
cannot be heard.

In this paper, I extend the types of transformations to include
pseudo-differential operators. In the language of signal processing, a
pseudo-differential operator on a sampled signal is a matrix with limited
extent off its diagonal and a limited rate of change along the diagonal; one
could also call it a slowly evolving filter. As far as I can tell, the
community has not used pseudo-differential operators in transform codecs,
because, I would guess, they are not diagonal in any standard basis and are
therefore more difficult to invert.^2 The phase space theory of these
operators, below, resolves the presumed difficulties and brings
pseudo-differential operators into the realm of practical transforms.^3

^1 Blocking artifacts are difficult to analyze precisely because $Q$ is not
linear.

^2 For example, there is the following statement in [1]: "In order to perform
well for most signals, however, the processing has to be applied to different
parts of the frequency spectrum independently, since transient events are
often present only in certain portions of the spectrum. This can be done using
more complex hybrid filterbanks that allow for separate gain processing of
different spectral components. In general, however, the interdependencies
between the gain modification and the coder's perceptual model are often
difficult to resolve."

^3 Be advised that the method is the subject of a provisional patent
application to the United States Patent Office.

2 The Weyl symbol

A symbol correspondence is a bijection between operators (here, on signals)
and functions on the corresponding classical phase space (here, the
time-frequency plane). The canonical symbol correspondence is the Weyl [2]
symbol. It enjoys many properties that entitle it to be called "the" phase
space representation of an operator, and is defined as follows. If $A$ is an
operator with $t$-space matrix elements $\langle t_1|A|t_2\rangle$, its Weyl
symbol $(sA)$ is a function of $t$ and $f$ defined by

$$(sA)(t, f) = \int ds\, e^{-2\pi i s f}\, \langle t + s/2|A|t - s/2\rangle. \tag{2.1}$$

That is, the Weyl symbol is the Fourier transform of the matrix in its differ- 
ence variable. 

Some examples and properties of s: 

1. If $I$ is the identity operator, then $(sI)(t, f) = 1$.

2. If $A$ is diagonal in the $t$-representation with $\langle t|A|t\rangle = a(t)$,
then $(sA)(t, f) = a(t)$. If $A$ is diagonal in the $f$-representation, with
diagonal elements $b(f)$, then $(sA)(t, f) = b(f)$. This parallel is part of a
larger metaplectic covariance of the formalism.

3. If $\psi$ is a signal, then we can form the one-dimensional projection
operator $A = |\psi\rangle\langle\psi|$. The Weyl symbol of this operator is
called the Wigner [3] function, which, in signal processing, is a type of
spectrogram. It is
typically a rapidly varying function on phase space. Generally speak- 
ing, the Wigner function contains too much information to be of use 
for our purposes. However, various smoothings of it are valuable. 
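
Properties 1 and 2 can be verified on a small discrete example. The sketch below is one possible discretization, not a construction taken from the text: it assumes NumPy, periodic time indices, an odd number of samples N (so that the half-difference index is well defined modulo N), and the sign convention $e^{-2\pi i s f}$ in the Fourier transform over the difference variable. The function name `weyl_symbol` is hypothetical.

```python
import numpy as np

def weyl_symbol(A):
    """Discrete Weyl symbol W[t, f] of an N x N matrix, N odd, periodic indices."""
    A = np.asarray(A, dtype=complex)
    N = A.shape[0]
    if A.shape != (N, N) or N % 2 == 0:
        raise ValueError("expected a square matrix with odd dimension")
    idx = np.arange(N)
    n, m = idx[:, None], idx[None, :]
    # G[n, m] = <n + m | A | n - m>; the difference variable is s = 2m
    G = A[(n + m) % N, (n - m) % N]
    # phase[m, k] = exp(-2 pi i (2m) k / N): Fourier transform in the difference variable
    phase = np.exp(-2j * np.pi * np.outer(2 * idx, idx) / N)
    return G @ phase

N = 9
idx = np.arange(N)
a = np.linspace(1.0, 2.0, N)                       # diagonal elements in the t-representation
c = np.exp(-idx.astype(float))                     # first column of a circulant matrix
circulant = c[(idx[:, None] - idx[None, :]) % N]   # diagonal in the f-representation

sym_identity = weyl_symbol(np.eye(N))              # expected: 1 everywhere
sym_diagonal = weyl_symbol(np.diag(a))             # expected: a(t), independent of f
sym_circulant = weyl_symbol(circulant)             # expected: b(f) = DFT of c, independent of t
```

In this discretization a circulant matrix plays the role of an operator that is diagonal in the $f$-representation, and its diagonal elements $b(f)$ are the DFT of its first column.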

The Weyl symbol allows us to regard operators as functions on phase 
space. However, in order to use these functions, we need to know what 
happens to operator multiplication. It becomes the star product, defined by 

$$(sA) \star (sB) = s(AB). \tag{2.2}$$
The star product acts generally through an integral kernel $\Lambda$, called
the trikernel:

$$\big((sA) \star (sB)\big)(z) = \int dz_1\, dz_2\, (sA)(z_1)\, (sB)(z_2)\, \Lambda(z, z_1, z_2), \tag{2.3}$$

where $z = (t, f)$ etc., and $\Lambda$, not needed elsewhere, is an exponential
of the area spanned by the triangle with vertices $(z, z_1, z_2)$. The
trikernel formula is useful in some contexts, such as when one of the symbols
is a Wigner function, but is often too complicated to be of value. However,
there is a class of functions we will call slowly varying, for which a simpler
expression holds. (These functions are the symbols of pseudo-differential
operators.)

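If $A$ and $B$ are slowly varying symbols, then their star product can be
expanded as a series of bidifferential operators, called the Moyal [4] star
product:

$$(A \star B)(t, f) = A(t, f)\, \exp\!\big(\overleftrightarrow{J}\big)\, B(t, f), \tag{2.4}$$

where the "Janus" operator is

$$\overleftrightarrow{J} = \frac{i}{4\pi}\left( \overleftarrow{\partial_t}\, \overrightarrow{\partial_f} - \overleftarrow{\partial_f}\, \overrightarrow{\partial_t} \right). \tag{2.5}$$

Here, derivatives topped with left (right) arrows act to the left (right,
resp.). Conversely, if the Moyal series converges for $A \star B$, then they
are slowly varying.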

In the case of extremely slowly varying functions, the star product is well- 
approximated by its leading term, the ordinary product. This is important: it 
means for the right operators, the Weyl transform maps complicated operator 
multiplication to simple ordinary multiplication. 
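
To make the leading-term statement concrete, the Moyal product can be written out through first order. This is a sketch under the assumption that the Janus operator carries the prefactor $i/(4\pi)$, the choice that corresponds to $[\hat t, \hat f] = i/2\pi$; with another convention only the constant changes. Here $(\sigma_t, \sigma_f)$ denote the scales on which the symbols vary.

```latex
A \star B \;=\; AB
  \;+\; \frac{i}{4\pi}\left(\partial_t A\,\partial_f B - \partial_f A\,\partial_t B\right)
  \;+\; O\!\left(\frac{1}{(4\pi\sigma_t\sigma_f)^{2}}\right)
```

Each pair of derivatives contributes a factor of order $1/(\sigma_t\sigma_f)$, so the first correction is smaller than $AB$ by roughly $1/(4\pi\sigma_t\sigma_f)$, and it vanishes identically when $B$ is a function of $A$.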
How do we know a priori whether a function is slowly varying? One
way is to consider sets of functions having rigorous bounds on the ratio of
higher terms in the Moyal series to the leading term. One such set is the set
of functions of bounded variation. We say $A(t, f)$ is of bounded variation,
with length scales $(\sigma_t, \sigma_f)$, if

[Eq. (2.6), the defining bound, is not recoverable from the source.]

With this definition, it is easy to prove that if $A$ and $B$ are of bounded
variation and $2\pi\sigma_t\sigma_f \gg 1$, then the Moyal series converges. In
other words, the area of the characteristic scale of variation must be much
larger than a Planck cell. This is a direct consequence of time-frequency
uncertainty.
We can create functions of bounded variation by using a sech kernel: if

$$k(t, f) = \frac{1}{4\sigma_t\sigma_f}\, \mathrm{sech}\!\left(\frac{\pi t}{2\sigma_t}\right) \mathrm{sech}\!\left(\frac{\pi f}{2\sigma_f}\right), \tag{2.7}$$

and $A(t, f)$ is a positive-valued function, then the convolution $k * A$ is of
bounded variation, with scales $(\sigma_t, \sigma_f)$.^4 Note the convolution
is on the whole phase space, rather than just in $t$ or $f$.
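
The smoothing step can be sketched as a separable convolution on a sampled grid. The code below assumes NumPy, a kernel of the form sech(pi t / (2 sigma_t)) sech(pi f / (2 sigma_f)) / (4 sigma_t sigma_f) (the form that integrates to one), a uniform grid with steps `dt` and `df`, and truncation of the kernel at ten scale lengths; the function names are hypothetical.

```python
import numpy as np

def sech_kernel_1d(sigma, step, half_width=10.0):
    """Samples of sech(pi*x / (2*sigma)) / (2*sigma), which integrates to one."""
    n = int(np.ceil(half_width * sigma / step))
    x = np.arange(-n, n + 1) * step
    return 1.0 / (2.0 * sigma * np.cosh(np.pi * x / (2.0 * sigma)))

def sech_smooth(A, dt, df, sigma_t, sigma_f):
    """Convolve A[t, f] with the separable sech kernel over the whole grid."""
    kt = sech_kernel_1d(sigma_t, dt) * dt          # quadrature weights along t
    kf = sech_kernel_1d(sigma_f, df) * df          # quadrature weights along f
    if len(kt) > A.shape[0] or len(kf) > A.shape[1]:
        raise ValueError("grid is too small for the requested smoothing scales")
    out = np.apply_along_axis(np.convolve, 0, A, kt, mode="same")
    out = np.apply_along_axis(np.convolve, 1, out, kf, mode="same")
    return out

rng = np.random.default_rng(4)
A = rng.uniform(0.5, 1.5, size=(300, 300))         # rough, positive test function
S = sech_smooth(A, dt=0.1, df=0.1, sigma_t=0.5, sigma_f=0.5)
flat = sech_smooth(np.ones((300, 300)), dt=0.1, df=0.1, sigma_t=0.5, sigma_f=0.5)
```

Values within one kernel half-width of the grid boundary are biased low because `mode="same"` pads with zeros; only the interior should be trusted.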

^4 The normalization is chosen so that the convolution tends to the identity
operator as $(\sigma_t, \sigma_f) \to 0$.

Finally, we introduce an important formula for the symbol of a function
of an operator (the sofoo formula), i.e., $s f(A)$. For example, given $A$ or
its symbol $\mathcal{A} = sA$, we will need to know $s(A^{1/2})$ and
$s(A^{-1})$. Fortunately, the subject was considered at length by Gracia-Saz
[5]. The general formula is quite complicated and expressed in terms of a
series of diagrams. For our purposes, the important facts are these: first,
which follows directly from the Moyal product, that

$$s f(A) = f(\mathcal{A}) + \text{h.o.t.}, \tag{2.8}$$

and the higher order terms at right involve successively more derivatives
of $\mathcal{A}$; second, that those derivative terms contain $t$ and $f$
derivatives in pairs. Thus, if $\mathcal{A}$ is of bounded variation, then the
series for $s f(A)$ is well-approximated by its first term. However, even if
$\mathcal{A}$ is not of bounded variation, the series might still converge; in
particular, we might imagine varying the smoothing scales
$(\sigma_t, \sigma_f)$ over phase space while maintaining a constant product.



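The first correction term to Eq. (2.8) is, writing subscripts for partial
derivatives of the symbol $\mathcal{A} = sA$,

$$-\frac{1}{4(2\pi)^2}\left[ \frac{f''(\mathcal{A})}{2!}\left(\mathcal{A}_{tt}\mathcal{A}_{ff} - \mathcal{A}_{tf}^2\right) + \frac{f'''(\mathcal{A})}{3!}\left(\mathcal{A}_t^2\mathcal{A}_{ff} + \mathcal{A}_f^2\mathcal{A}_{tt} - 2\mathcal{A}_t\mathcal{A}_f\mathcal{A}_{tf}\right) \right].$$

The diagrams in [5] and [6] help express this result more economically. This
formula may find application below when the first term alone is not accurate
enough.^5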



^5 The situation is worse for other symbol correspondences, e.g., the normal
ordered symbol, for which $s_N^{-1}(tf) = -\frac{i}{2\pi}\, t\, \partial_t$. For
those symbols, the first correction term in the sofoo formula is of lower
order; in terms of the scale of variation, it scales as
$1/(\sigma_t\sigma_f)$ and is less easily ignored.

3 Phase space description of noise.

Fully understanding a time series of random variables $Y(t)$ requires knowing
its entire joint probability distribution. However, for most purposes, it
suffices to study only the two-point correlation function, that is, the
expectation of a product at two times:

$$E\big(Y(t_1)\, Y(t_2)^*\big), \tag{3.1}$$

assuming the variables have individual means of zero. This expectation allows
us to define a Hermitian noise operator $N_Y$, whose $t$-space matrix elements
are as above, i.e., $\langle t_1|N_Y|t_2\rangle = E(Y(t_1)Y(t_2)^*)$.

We will characterize a given noise operator by its Weyl symbol. For
example, the noise operator for white noise $X(t)$ with unit variance is the
identity operator, and hence its symbol is unity, which makes sense. The
random variable series for time-localized noise is $Y = WX$, where $W$ is a
window. It follows that the Weyl symbol for this noise is just $W(t)^2$.
Colored noise, on the other hand, is defined by its Fourier transform,
$\mathcal{F}Y = WX$, and its Weyl symbol is $W(f)^2$.

The converse problem is to produce noise with a given (Hermitian) noise
operator $N$, and the solution is simple: defining $Y = N^{1/2}X$ gives the
desired noise, since

$$E(|Y\rangle\langle Y|) = N^{1/2}\, E(|X\rangle\langle X|)\, N^{1/2} = N. \tag{3.2}$$

Here we use for white noise that $E(|X\rangle\langle X|) = I$. As a practical
matter, finding the square root of a given operator may not be so easy.
However, if the operator has a slowly varying Weyl symbol $\mathcal{N} = sN$,
the matter is straightforward:
$N^{1/2} = s^{-1} s N^{1/2} \approx s^{-1}\mathcal{N}^{1/2}$; that is, we simply
take the square root of the noise operator's symbol, and convert it back to an
operator using the inverse symbol. Note that this procedure does require that
$\mathcal{N}$ is a positive function.
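
The construction $Y = N^{1/2}X$ of Eq. (3.2) can be illustrated with an explicit matrix square root. This is a sketch that assumes NumPy and a small, hypothetical noise operator built from a gaussian time window and a short-range correlation; it takes the square root by eigendecomposition instead of through the symbol, which is affordable only for small matrices. The names `psd_sqrt`, `N_op` and `N_est` are illustrative.

```python
import numpy as np

def psd_sqrt(N_op):
    """Symmetric square root of a real symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(N_op)
    w = np.clip(w, 0.0, None)                      # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

# Hypothetical noise operator: time-localized and short-range correlated
n = 64
t = np.arange(n)
W = np.exp(-0.5 * ((t - n / 2) / 10.0) ** 2)       # window
corr = 0.5 ** np.abs(t[:, None] - t[None, :])      # positive-definite Toeplitz matrix
N_op = W[:, None] * corr * W[None, :]

root = psd_sqrt(N_op)

# Unit-variance white noise: uniform on (-1/2, 1/2), rescaled by sqrt(12)
rng = np.random.default_rng(1)
X = rng.uniform(-0.5, 0.5, size=(n, 20_000)) * np.sqrt(12.0)
Y = root @ X
N_est = (Y @ Y.T) / X.shape[1]                     # sample estimate of E(|Y><Y|)

print("max deviation from N:", np.abs(N_est - N_op).max())
```

The sample estimate approaches the prescribed operator as the number of realizations grows; the residual printed above is statistical.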
As a psychoacoustical matter, can we always describe noise by a slowly
varying operator? Consider, for example, noise created from white noise
by applying rapidly varying operators. In particular, the one-dimensional
projector $|\psi\rangle\langle\psi|$ onto a signal $\psi$ is definitely not a
slowly varying operator, and, when applied to white noise, gives a signal
proportional to $\psi$ itself; the only randomness left is in the overall norm
of the signal. Roughly speaking, the more rapidly an operator $A$ varies, the
more structure it imparts to a white noise signal $X$, and the less noisy the
resulting signal sounds.



4 Phase space setting for masking of noise. 

Psychoacoustical experiments of signals masking noise are consistent with the 
hypothesis that the maximal noise masked by a given signal is a slowly vary- 
ing noise operator. Masking experiments, except for those done informally 
in the testing of compression algorithms, are typically done in time or fre- 
quency, but not both. The classic paper by Ehmer shows masking curves 
of noise by pure tones. The curves typically peak at the tone frequency and 
fall off at a scale proportional to the frequency itself, but faster toward de- 
creasing frequency. Temporal masking experiments show pre-masking rising 
to a certain threshold under the signal, and decaying afterward. 

We can generalize these results to a broader hypothesis: For a given
signal $\psi$ there exists a noise operator $M_\psi$ such that $AX$ is fully
masked by $\psi$ whenever $s(A^2)$ is strictly less than
$\mathcal{M}_\psi = sM_\psi$. In other words, $\psi$ generates a phase space
profile for the maximum allowed noise.

This phase space profile must be related to the phase space profile of
$\psi$ itself, but how? We have already mentioned that the Wigner function
is typically not useful for phase space analysis. To begin, it is not slowly
varying. Moreover, it sometimes falls below zero. We can guess, however,
that the masking noise profile of $\psi$ might be related to a smoothing of the
Wigner function over phase space. One particular smoothing of the Wigner
function gives another well-known phase space distribution: the coherent
state representation.

The coherent state representation $C_\psi(t_0, f_0)$ is defined as follows.
Using the moving gaussian window with width $\sigma$ defined by

[Eq. (4.1), the window definition, is not recoverable from the source.]

we define

$$C_\psi(t, f) = \left|(\mathcal{F} w_t \psi)(f)\right|^2, \tag{4.2}$$

where $\mathcal{F}$ is the Fourier transform. The coherent state representation
depends on the parameter $\sigma$, making it less canonical than the Wigner
function; on the other hand, the finite width of the window makes it much
easier to calculate. The coherent state representation, regarded as a symbol
of an operator, is not slowly varying, but it does vary more slowly than the
Wigner function, and it is never negative.

Our hypothesis, then, is that $sM_\psi$ is related to a smoothing of $C_\psi$
(which is itself a smoothing of $s(|\psi\rangle\langle\psi|)$) with
normalization and width parameters determined by listening tests. The future
full theory will take into account different masking widths at different
frequencies as well as the statistical properties of $C_\psi$, in order to
account for the known asymmetry between the masking of noise by tones and the
masking of noise by noise.
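
A coherent state representation of the kind described above is, in practice, a gaussian-window spectrogram. The following sketch assumes NumPy and an unnormalized window exp(-(t - t0)^2 / (2 sigma^2)); the window's exact form and normalization are assumptions, since the defining equation is not available, and the function name and the `hop` parameter are hypothetical.

```python
import numpy as np

def coherent_state_rep(psi, sigma, hop=1):
    """|FFT of (gaussian window * psi)|^2 for window centres 0, hop, 2*hop, ..."""
    psi = np.asarray(psi, dtype=float)
    t = np.arange(len(psi))
    centers = np.arange(0, len(psi), hop)
    C = np.empty((len(centers), len(psi) // 2 + 1))
    for row, t0 in enumerate(centers):
        window = np.exp(-0.5 * ((t - t0) / sigma) ** 2)   # assumed window shape
        C[row] = np.abs(np.fft.rfft(window * psi)) ** 2   # squared modulus: never negative
    return centers, C

# A pure tone at frequency bin 32 of a 256-sample signal
n = 256
tone = np.cos(2 * np.pi * 32 * np.arange(n) / n)
centers, C = coherent_state_rep(tone, sigma=20.0, hop=16)
middle = int(np.where(centers == 128)[0][0])
print("peak frequency bin at t0 = 128:", C[middle].argmax())
```

Being a squared modulus, the result is non-negative by construction, and widening the window trades time resolution for frequency resolution.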

In the meantime, I have explored simplified theories that, though yielding
sub-maximal phase space noise thresholds $\mathcal{M}_\psi$, nevertheless
condemn to obscurity noise operators whose symbols fall below them. I will
call such noise operators noise-confining operators; the goal for more
sophisticated psychoacoustical models will be an algorithm for generating the
maximal noise-confining operator. However, as we shall see, a sub-maximal
noise-confining operator can still be useful.

Finding a noise-confining operator is straightforward. For a signal $\psi$, I
formed $S_\psi$, the coherent state representation $C_\psi$ smoothed by
convolving it with the sech kernels of Eq. (2.7) in order to produce an easily
manipulated function of bounded variation. I used width parameters suggested
by masking experiments. To test the theory, I took $s^{-1}(S_\psi^{1/2})$, and,
as explained in section 3, applied this operator to a noisy signal $x$ (a
realization of the uniformly distributed random variable series $X$). I then
listened to

$$\psi + \alpha\, s^{-1}(S_\psi^{1/2})\, x \tag{4.3}$$

and increased $\alpha$ to the threshold at which the above began to sound
different from $\psi$. By repeating this for different signals, and choosing
the smallest $\alpha$, I became confident that

$$M_\psi = \alpha^2 \left[s^{-1}(S_\psi^{1/2})\right]^2 \tag{4.4}$$

did indeed describe a noise-confining operator.
5 Phase space codec

In the previous section, I introduced an explicitly phase space setting
for the signal-dependent threshold of noise, and we will now use it to design a
lossy transform codec. First, a note about normalization. I will assume that
the original signal $\psi$ and the encoded signal $\psi_{\mathrm{encoded}}$ are
both quantized on a unit scale. The quantization noise $X$ present in
$\psi_{\mathrm{encoded}}$ may, under certain conditions, be described as
uniformly distributed on the interval $(-1/2, 1/2)$ with variance $1/12$. We
have, for the encoded and restored signals, that

$$\psi_{\mathrm{encoded}} = Q K^{-1}\psi, \qquad \psi_{\mathrm{restored}} = K\psi_{\mathrm{encoded}} \approx \psi + KX. \tag{5.1}$$

Comparing Eqs. (5.1) and (4.3), we set

$$K = s^{-1}(\mathcal{M}_\psi^{1/2}), \tag{5.2}$$

where $\mathcal{M}_\psi = sM_\psi$ is the symbol of the masking operator, so that
the noise introduced into $\psi_{\mathrm{restored}}$ is just at the threshold
measured by the psychoacoustical model.

I now present the argument showing how $K$ reduces the average bit content
of $\psi_{\mathrm{encoded}}$.^6 I use an empirical observation that the values
taken by $\psi_{\mathrm{encoded}}$ are uniformly distributed over its range, but
the argument does generalize easily to more general distributions. If this
assumption is true, we can estimate the average size of
$\psi_{\mathrm{encoded}}$ from its variance:

$$\begin{aligned}
E(\psi_{\mathrm{encoded}}^2) &\propto \mathrm{tr}\big(|\psi_{\mathrm{encoded}}\rangle\langle\psi_{\mathrm{encoded}}|\big) \\
&\approx \mathrm{tr}\big(M_\psi^{-1/2}\,|\psi\rangle\langle\psi|\,M_\psi^{-1/2}\big) \\
&\propto \frac{\int dt\, df\, \mathcal{M}_\psi^{-1}\, s(|\psi\rangle\langle\psi|)}{\int dt\, df}.
\end{aligned} \tag{5.3}$$

Here tr takes the trace of its operand, $\mathcal{M}_\psi = sM_\psi$, and, in the
last line, we have used the traciality property of the Weyl symbol, namely,
that

$$\mathrm{tr}(AB) \propto \int dt\, df\, (sA)(sB). \tag{5.4}$$

On the right, it is important to note that the ordinary product appears rather
than the star product. The traciality property converts the mean over $t$
space into a mean over phase space. In the last line of Eq. (5.3), we have
divided by the phase space volume as a formal way to avoid worrying about the
normalization factor in Eq. (5.4). Now, in the numerator integrand, the
slowly varying function $\mathcal{M}_\psi^{-1}$ appears next to the rapidly
varying Wigner function $s(|\psi\rangle\langle\psi|)$. To a good approximation,
then, we may replace the Wigner function by its average value within the
variation scale of $\mathcal{M}_\psi^{-1}$. This average is, of course,
$S_\psi$. Thus, if we are working with the simplified model where
$\mathcal{M}_\psi = \alpha^2 S_\psi$, we find the expectation

$$E(\psi_{\mathrm{encoded}}^2) = \frac{1}{\alpha^2} \approx 100. \tag{5.5}$$

^6 Since $\psi_{\mathrm{encoded}}$ is determined from the psychoacoustical model,
this method is inherently variable bit rate.

Using the information theoretic definition of entropy, we can convert this
into a bit rate. Since we have not yet used that $\psi_{\mathrm{encoded}}$ is
uniformly distributed, we can afford to make a more general argument in which
$\psi_{\mathrm{encoded}}$, before quantization, takes its values from a
probability density $p(\phi)\,d\phi$. Quantization casts its values into bins
$i$ of width $q$ ($= 1$), and the probability that $\phi$ falls within the
$i$th bin is $P_i$, where

$$P_i = \int_{q(i-1/2)}^{q(i+1/2)} d\phi\, p(\phi) \approx p(i), \tag{5.6}$$

where we have used $q = 1$. The entropy per sample is

$$S = -\sum_i P_i \log_2 P_i. \tag{5.7}$$

Thus, if $p(\phi)$ is uniformly distributed with standard deviation $\sigma$,

$$S = \log_2 \sigma + \log_2 2\sqrt{3}. \tag{5.8}$$

Or, if $p(\phi)$ is gaussian,

$$S = \log_2 \sigma + \log_2 \sqrt{2\pi e}. \tag{5.9}$$

Now, $\sigma$ itself is obtained from $\psi$, leading to

$$\sigma^2 = A_t\!\left[\mathcal{M}_\psi^{-1}\, s(|\psi\rangle\langle\psi|)\right], \tag{5.10}$$

where $A_t$ denotes a phase space average over a time scale large enough
to quell rapid variations in the result. This formula, when plugged into
either Eq. (5.8) or Eq. (5.9), as appropriate, gives an expression for the
time-dependent number of bits consumed by $\psi_{\mathrm{encoded}}$. This
formula evidently defines a phase space measure of perceptual entropy.

When, as in our simple model, $\sigma \approx 1/\alpha$, we find

$$S = \log_2 10 + \log_2 2\sqrt{3} \approx 5, \tag{5.11}$$

so that the lossy stage of this encoding scheme takes no more than about 5
bits per sample.
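
The entropy estimates can be checked directly from the bin probabilities. This sketch uses only the Python standard library; it assumes unit-width bins centred on the integers and a gaussian density with zero mean, and the function names are illustrative. It compares the binned entropy with the closed form $\log_2\sigma + \log_2\sqrt{2\pi e}$ and evaluates the uniform-density estimate for $\sigma = 10$.

```python
import math

def gaussian_bin_probs(sigma, n_sigma=12.0):
    """P_i for unit-width bins centred on the integers, gaussian p(phi)."""
    half = int(math.ceil(n_sigma * sigma))
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))
    return [cdf(i + 0.5) - cdf(i - 0.5) for i in range(-half, half + 1)]

def entropy_bits(probs):
    """Entropy per sample, S = -sum_i P_i log2 P_i."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

sigma = 10.0
s_binned = entropy_bits(gaussian_bin_probs(sigma))                          # direct sum over bins
s_gauss = math.log2(sigma) + math.log2(math.sqrt(2.0 * math.pi * math.e))   # gaussian closed form
s_uniform = math.log2(sigma) + math.log2(2.0 * math.sqrt(3.0))              # uniform closed form

print("binned gaussian entropy:", s_binned)
print("gaussian closed form   :", s_gauss)
print("uniform closed form    :", s_uniform)
```

For $\sigma = 10$ the uniform estimate evaluates to a little over 5.1 bits, which is the figure rounded to 5 in the text; the gaussian estimate is about a quarter of a bit higher.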

As for the coding, we may again employ the considerable power of the
sofoo formula and approximate

$$L = K^{-1} \approx s^{-1}(\mathcal{M}_\psi^{-1/2}). \tag{5.12}$$

That is, we simply invert the masking threshold $\mathcal{M}_\psi$, take its
square root, and apply the inverse Weyl symbol. This procedure ignores higher
order terms in the exact expression for $K$'s inverse. If this is not accurate
enough, we can always write the operator more accurately by using the higher
order terms in the sofoo formula. (And this is okay, since time is the luxury
of the coder.)



6 Summary of the codec so far 

Through listening tests, we refine a phase space theory for the
signal-dependent threshold of noise. The outcome is a mapping from $\psi$ to a
noise operator $M_\psi$ with symbol $\mathcal{M}_\psi = sM_\psi$. We define a key
operator $K = s^{-1}(\mathcal{M}_\psi^{1/2})$ and send it off to a bit packing
(entropy coding) algorithm for further compression. Using the formula for the
symbol of a function of an operator, we define the lock operator
$L = s^{-1}(\mathcal{M}_\psi^{-1/2})$, apply it to $\psi$, quantize the result,
and deliver it also for bit packing. This is the coding. As for decoding, we
unpack the key and the encoded signal and then apply the key operator to the
encoded signal.
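
The encode/decode round trip summarized above can be sketched end to end on a toy problem. The code assumes NumPy, a periodic discretization with an odd number of samples, and a hypothetical threshold symbol that depends on time only (in which case the key and lock are exactly diagonal and the leading-term approximation is exact). The function `weyl_inverse` is one possible discrete inverse symbol and is not taken from the text; taking the real part of the transformed signal is a simplification that is exact here.

```python
import numpy as np

def weyl_inverse(W):
    """Operator (N x N matrix, N odd) whose discrete Weyl symbol is W[t, f]."""
    W = np.asarray(W, dtype=complex)
    N = W.shape[0]
    if W.shape != (N, N) or N % 2 == 0:
        raise ValueError("expected a square symbol array with odd dimension")
    idx = np.arange(N)
    # G[n, m] = <n + m | A | n - m> = (1/N) sum_k W[n, k] exp(+2 pi i (2m) k / N)
    G = (W @ np.exp(2j * np.pi * np.outer(idx, 2 * idx) / N)) / N
    A = np.empty((N, N), dtype=complex)
    n, m = idx[:, None], idx[None, :]
    A[(n + m) % N, (n - m) % N] = G                # a bijection of index pairs for odd N
    return A

def encode(psi, M):
    """Apply the lock s^-1(M^-1/2), then quantize on a unit scale."""
    lock = weyl_inverse(M ** -0.5)
    return np.round((lock @ psi).real)

def decode(encoded, M):
    """Apply the key s^-1(M^1/2) to the quantized signal."""
    key = weyl_inverse(np.sqrt(M))
    return (key @ encoded).real

N = 101
t = np.arange(N)
profile = 4.0 + 3.0 * np.cos(2 * np.pi * t / N)    # hypothetical threshold, always positive
M = np.repeat(profile[:, None], N, axis=1)         # symbol M(t, f) depending on t only

rng = np.random.default_rng(2)
psi = rng.normal(scale=100.0, size=N)
restored = decode(encode(psi, M), M)
error = restored - psi                             # the introduced noise, K X
```

In this toy case the introduced noise at each time is bounded by half the square root of the threshold there, which is the sense in which the key shapes the quantization noise to the masking profile.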
7 Practical issues and modifications

In this section, we introduce two modifications which would have cluttered
the earlier presentation.

Existing perceptual codecs, in addition to exploiting masking phenomena,
also exploit the fact that much of the high frequency content is irrelevant
because we cannot hear it anyway. This fact is easy to put into the phase
space framework. Let $H$ be the noise operator for the frequency-dependent
threshold of human hearing, i.e., the loudest colored noise that cannot be
heard in any circumstances. We can then add $H$ to the noise in Eq. (4.3)
without changing how $\psi$ sounds. This suggests we take
$K = (M_\psi + H)^{1/2}$. However, examining the formula for
$\psi_{\mathrm{restored}}$, we see that this key introduces noise that, though
inaudible, is independent of the signal itself, meaning that it carries no
information. I have found that it works well to keep the $H$ term in the lock,
but drop it from the key. Two choices that work well for the lock are

[Eq. (7.1), giving the two choices of $sL$, is not recoverable from the source.]

If we use these locks, then even in the not-quantized case, the restored signal
is different from the original. In the second lock above, it becomes

$$\psi_{\mathrm{restored}} = \left(\frac{M_\psi}{M_\psi + H}\right)^{1/2} \psi. \tag{7.2}$$

This expression bears similarity to a Wiener filter in [8].
Using this type of lock can significantly increase the subsequent lossless
compressibility of $\psi_{\mathrm{encoded}}$. However, the improvement is not
the same for all signals; those with significant high frequency content retain
their original compressibility.

This brings us to the second deviation from the prior setup. So far, we
have always written $\psi_{\mathrm{encoded}}$ in the time domain, but this is a
problem because $\psi_{\mathrm{encoded}}$ is not as compressible there. Even
lossless compressors designed for the time domain, such as ones using linear
prediction coefficients, do not perform as well as LZ or Huffman encoding in
the frequency domain. I have therefore found it better to package the encoded
signal in DFT'd chunks. To avoid the errors caused by quantizing twice, I delay
the quantization of the encoded signal until after it has been Fourier
transformed. This is valid because white noise is white in both the time and
frequency domains. However, one must be careful to use a suitably large FFT.
If the chunks are too small, frequency localization in the DFTs can cause the
quantization noise assumptions to break down and introduce a noticeable warble
to the decoded signal. I find that chunks of 512 or 1024 samples work best. In
this format, standard compression programs (like gzip) reduce monophonic
samples at 44 kHz to between 4 and 12 percent of their original size. However,
this does not include storing the key.
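
Delaying quantization until after the Fourier transform can be sketched as follows. The code assumes NumPy, non-overlapping chunks, an orthonormal real FFT (so that white noise keeps its variance across domains), and rounding of the real and imaginary parts to unit steps; how the two parts are scaled before rounding is not specified in the text, so that choice, like the function names, is an assumption.

```python
import numpy as np

def quantize_in_dft_chunks(x, chunk=1024):
    """Fourier transform each chunk, then round real and imaginary parts."""
    x = np.asarray(x, dtype=float)
    if len(x) % chunk:
        raise ValueError("signal length must be a multiple of the chunk size")
    spectrum = np.fft.rfft(x.reshape(-1, chunk), axis=1, norm="ortho")
    return np.round(spectrum.real) + 1j * np.round(spectrum.imag)

def restore_from_dft_chunks(q, chunk=1024):
    """Inverse transform of the quantized chunks, concatenated in time."""
    return np.fft.irfft(q, n=chunk, axis=1, norm="ortho").reshape(-1)

rng = np.random.default_rng(5)
x = rng.normal(scale=50.0, size=4 * 1024)          # stand-in for the lock output
q = quantize_in_dft_chunks(x)                      # integers ready for gzip-style packing
y = restore_from_dft_chunks(q)
rms_error = np.sqrt(np.mean((y - x) ** 2))
print("rms quantization error:", rms_error)
```

Because the transform is orthonormal, the time-domain error energy equals the error energy over the full spectrum, so the rms error cannot exceed $\sqrt{1/2}$ with unit rounding steps.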






8 Storing the key 



The key spectrogram in this method takes the place of scale factor side
information in standard lossy codecs. Naive lossless compression of a sampled
key^7 spectrogram yields disappointing results. Even though it is a smooth
object with variations on the order of 40 times the minimum uncertainty
scale $\Delta f\, \Delta t = 1$, there is simply too much overhead in storing
values at every point, or differences between them; I have tried just about
everything. In compression of monophonic samples, no such lossless method made
the key file take less than 10 percent of the sample size. I have realized
that, in order to make this method competitive, we must regard as only a
suggestion that the key noise operator should be equal to the measured masking
operator. Of course, if we make the key bigger than that operator, we will no
longer be in the noise-confined regime. Conversely, smaller keys sacrifice some
of the available entropy. However, I have found that the key can be stored at a
fractional accuracy of 10 percent without substantially introducing audible
noise or degrading the compressibility.

^7 Of course, any key is a sampled key in this method. I usually sample the
spectrogram at one half the variance of the coherent state window. I assume
readers in this field are familiar with the transition from the continuous
case, which I have presented here for its ease of elucidation, to the discrete
case which occurs in practice.

This latitude allows us to store the key as an interpolated object where
the value at each knot is specified with only one byte. Specifically, I have
used an adaptive grid by allowing for variable time steps and then, for each
selected time, sampling the slice at a time-specific frequency step size. The
step sizes for the adaptive grid are chosen as large as possible subject to the
constraint that a linear spline over it differs fractionally from the original
key by no more than 10 percent. The step size information, together with the
values at the spline knots, makes up a much smaller object: they reduce the
overhead to less than 1 percent. One might think, given my earlier emphasis
on using functions with bounded variation so that the sofoo formula applies,
that the obvious discontinuities introduced by this method would cause the
whole framework to fall apart. However, as is often the case in semi-classical
analysis, we get more than we deserve using the final results of naive formal
calculations: the method seems to work fine even with only piecewise smooth
keys. If, however, in the future, these are found to introduce artifacts, more
sophisticated curve fits, such as cubic splines, could be developed, without,
I think, sacrificing compressibility. An alternative would be to store an
interpolated key with the understanding that it would be smoothed in a
standard way after it is reconstituted; the practicality of such an approach
would depend on the spare computational overhead in the decode routine.
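
The adaptive step selection can be sketched in one dimension. The code below assumes NumPy and a strictly positive sampled profile, and it uses a simple greedy rule: extend each step until the first sample at which the linear interpolant would miss the original by more than the tolerance. This rule is an assumption, not the author's procedure; it does not search for longer steps beyond a first failure, and it leaves out the one-byte quantization of knot values. The function name `greedy_knots` is hypothetical.

```python
import numpy as np

def greedy_knots(y, tol=0.10):
    """Indices of linear-spline knots chosen greedily, left to right."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("expected a strictly positive key profile")
    knots = [0]
    i = 0
    while i < len(y) - 1:
        j = i + 1                                  # a one-sample step is always exact
        while j + 1 < len(y):
            idx = np.arange(i, j + 2)
            line = np.interp(idx, [i, j + 1], [y[i], y[j + 1]])
            if np.max(np.abs(line - y[idx]) / y[idx]) > tol:
                break                              # extending further would exceed tol
            j += 1
        knots.append(j)
        i = j
    return np.array(knots)

# Hypothetical smooth key slice
u = np.linspace(0.0, 1.0, 400)
key_slice = 1.0 + 0.8 * np.sin(2 * np.pi * 3 * u) ** 2
knots = greedy_knots(key_slice, tol=0.10)
spline = np.interp(np.arange(len(key_slice)), knots, key_slice[knots])
worst = np.max(np.abs(spline - key_slice) / key_slice)
print(len(knots), "knots for", len(key_slice), "samples; worst fractional error", worst)
```

The same routine could be applied first along time and then along frequency within each selected time slice to mimic the two-level grid described above.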

9 Conclusion 

It is clear that we perceive sound in a time-frequency plane, simply because
we hear pitch and rhythm. Thus, any psychoacoustic theory should achieve
its most natural form in phase space. If I am correct that the maximal
noise masked by a given signal is always characterized by a slowly varying
(pseudo-differential) noise operator, then this codec can exploit any valid
psychoacoustical model. This makes it an attractive framework for directly
translating advances in the phenomenology of masking into better lossy data
compression. It also offers an interesting perspective on perceptual entropy.

The main practical concern is the processing load of the main decoding
loop. In early, fairly unoptimized code, the decode runs faster than real time
by a factor of two. The decode loop is $O(N)$, but the coefficient is rather
large, on the order of 500. Whether this loop can be implemented in real
time on a portable device is beyond my expertise.

I have not presented any suggestions for how this method develops in
the stereophonic case. It presents many new and interesting issues, including
psychoacoustical modeling of binaural masking effects and matrix-valued
spectrograms. I leave these matters to a future paper. In the meantime, I
can report that my early attempts at stereophonic compression, in which
I separately calculate left and right smoothed spectrograms, use them to
transform the left and right channels, and then send the transformed mid
and side channels to lossless compression, are transparent (informally) at 6
to 13 percent overall compression ratios. It also works to form a single key
from the mid channel and use it to encode both the mid and side channels.

On the whole, I am encouraged by the performance at this early stage. 
The method is quite young, and it clearly has many refinements and tweaks 
ahead of it. Beyond that, the formalism emphasizes the value of phase space 
methods in the treatment of noise, masking phenomena, and the measure- 
ment of perceptual entropy. 
A Entropy of phase space noise

I am not sure how the following argument fits in with the earlier entropy
result of Eq. (5.10), but it is yet another interesting application for the
sofoo formula. The entropy $S$ of a series of random variables $Y_i$ is

$$S = -\int \prod_i dY_i\, P(Y) \log_2 P(Y), \tag{A.1}$$

where $P(Y) = P(Y_1, \ldots, Y_N)$ is the joint probability density. If
$Y = MX$, and $X$ is uncorrelated white noise, then we can use that $P(Y)$
transforms as a density to conclude that

$$S = -\int \prod_i dX_i\, P(X) \log_2\left(\det M^{-1}\right). \tag{A.2}$$

(For $X$ uniform on $(-1/2, 1/2)$, $P(X) = 1$ on its support, so the
$\log_2 P(X)$ term vanishes.) The $\log_2$ term being constant, we can
integrate out the p.d.f. for $X$ and resume as

$$S = \log_2(\det M) = \log_2 2^{\mathrm{tr} \log_2 M} = \int dt\, df\, s(\log_2 M) \approx \int dt\, df\, \log_2 sM, \tag{A.3}$$

where we have used the traciality property and, in the last step, assumed that
$M$ is slowly varying. In this context, that $\psi$ can tolerate the addition
of noise $s^{-1}(\mathcal{M}_\psi^{1/2})X$ means that it belongs to an ensemble
of identical-sounding signals, and its information content goes down by the
above result. This result differs from the previous one in that the original
$\psi$ does not appear, and the $\log_2$ is inside the integrand.
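
The identity behind the first two equalities, $\log_2 \det M = \mathrm{tr} \log_2 M$, is easy to confirm numerically. This sketch assumes NumPy and a small, randomly generated symmetric positive-definite matrix standing in for $M$; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(6, 6))
M = B @ B.T + 6.0 * np.eye(6)                      # hypothetical positive-definite operator

sign, logdet = np.linalg.slogdet(M)                # sign and natural log of |det M|
log2_det = logdet / np.log(2.0)                    # log2 det M

eigenvalues = np.linalg.eigvalsh(M)
trace_log2 = np.sum(np.log2(eigenvalues))          # tr log2 M, via the eigenvalues

print("log2 det M:", log2_det)
print("tr log2 M :", trace_log2)
```

The final step of the derivation, replacing $s(\log_2 M)$ by $\log_2 sM$, is the leading term of the sofoo formula and is not tested by this check.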
B Encoding already noisy signals

Suppose $\psi$ already sounds noisy. Then, if the statistics of
$\psi_{\mathrm{encoded}}$ are correct, the noisy part
$K\psi_{\mathrm{encoded}}$ may suffice for a realistic sounding reconstitution
of $\psi$. The argument goes as follows. As usual,

$$\psi_{\mathrm{encoded}} = QL\psi, \qquad \psi_{\mathrm{restored}} = K\psi_{\mathrm{encoded}} = KL\psi + KX. \tag{B.1}$$

We set

$$\psi = \psi_{\mathrm{restored}} = KL\psi + KX \tag{B.2}$$

and use that $\psi$ sounds noisy to approximate

$$\psi = kX, \tag{B.3}$$

leading to

$$k = KLk + K. \tag{B.4}$$

Inspection of this equation leads us to guess that

$$K = \alpha_K k, \qquad L = \alpha_L k^{-1}, \tag{B.5}$$

so that

$$1 = \alpha_K\alpha_L + \alpha_K. \tag{B.6}$$

This one equation does not determine these two proportionalities. We fix
this by requiring that $K$ be as large as possible, so that
$\psi_{\mathrm{encoded}}$ be as small as possible. This implies $\alpha_L = 0$.
Thus, when a signal is already noisy, we can take $L = 0$ and
$sK^2 \approx S_\psi$. In this extreme case, the entire signal information
is contained in the key. Of course, with $L = 0$, $\psi_{\mathrm{encoded}} = 0$,
and, in order that the reconstituted signal sound at all, we need to dither
white noise into $\psi_{\mathrm{encoded}}$. Real signals will contain a fraction
of noise and purer tones, so this extreme case will rarely actually occur;
nevertheless, the argument shows that noise can help us increase the overall
key scale, and hence the compression ratio for $\psi_{\mathrm{encoded}}$. The
argument also shows us another case where the lock is not the key's inverse.